Improving Minority Class Prediction Using Case-Specific Feature Weights

Authors

Claire Cardie

Nicholas Howe

Abstract

This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibits poor performance on minority class instances. We then present two CBL algorithms designed to improve the performance of minority class predictions. Each variation creates test-case-specific feature weights by first observing the path taken by the test case in a decision tree created for the learning task, and then using path-specific information gain values to create an appropriate weight vector for use during case retrieval. When applied to the NLP data sets, the algorithms are shown to significantly increase the accuracy of minority class predictions while maintaining or improving overall classification accuracy.
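The abstract does not spell out how the weight vector is built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes discrete features, grows just the test case's path greedily in an ID3-like fashion rather than consulting a prebuilt tree, assigns each feature on the path the information gain measured at its node, gives off-path features a weight of zero, and retrieves a single nearest case by weighted feature overlap. All function names (`entropy`, `information_gain`, `path_specific_weights`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(cases, labels, f):
    """Entropy reduction obtained by splitting the cases on discrete feature f."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[f], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def path_specific_weights(cases, labels, test_case):
    """Weights from the path the test case would follow in a greedy tree.

    Assumption: each feature tested on the path gets the information gain
    computed on the training cases that reach its node; all others get 0.
    """
    weights = [0.0] * len(test_case)
    unused = set(range(len(test_case)))
    while unused and len(set(labels)) > 1:
        gains = {f: information_gain(cases, labels, f) for f in unused}
        best = max(sorted(gains), key=gains.get)  # lowest index wins ties
        if gains[best] <= 0:
            break
        weights[best] = gains[best]
        unused.remove(best)
        # Descend: keep only cases that share the test case's value.
        kept = [(c, l) for c, l in zip(cases, labels)
                if c[best] == test_case[best]]
        if not kept:
            break
        cases = [c for c, _ in kept]
        labels = [l for _, l in kept]
    return weights


def classify(cases, labels, test_case, weights):
    """Return the label of the case with the highest weighted overlap."""
    def similarity(case):
        return sum(w for w, a, b in zip(weights, case, test_case) if a == b)
    best_index = max(range(len(cases)), key=lambda i: similarity(cases[i]))
    return labels[best_index]


if __name__ == "__main__":
    # Toy data with a skewed class distribution (four "maj", one "min").
    train = [("a", "x"), ("a", "y"), ("a", "y"), ("b", "x"), ("b", "y")]
    classes = ["maj", "maj", "maj", "maj", "min"]
    query = ("b", "y")
    w = path_specific_weights(train, classes, query)
    print(w, classify(train, classes, query, w))
```

Because the weights are recomputed per test case, a feature that separates classes only within a small region of the data can dominate retrieval there, which is the intuition behind using path-specific rather than global information gain.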

Related resources

Improving Minority Class Prediction Using Case-Specific Feature Weights

This paper addresses the problem of handling skewed class distributions within the case-based learning (CBL) framework. We first present as a baseline an information-gain-weighted CBL algorithm and apply it to three data sets from natural language processing (NLP) with skewed class distributions. Although overall performance of the baseline CBL algorithm is good, we show that the algorithm exhibit...

A Minimum Risk Metric for Nearest Neighbor Classification

nale. Retrieval in a prototype-based case library: A case study in diabetes therapy revision. [CH97] C. Cardie and N. Howe. Improving minority class prediction using case-specific feature weights. [CS93] Scott Cost and Steven Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. [DP97] Pedro Domingos and Michael Pazzani. On the optimality of the simple bayesian clas-si...

Improving the Quality of Minority Class Identification in Dialog Act Tagging

We present a method of improving the performance of dialog act tagging in identifying minority classes by using per-class feature optimization and a method of choosing the class based not on confidence, but on a cascade of classifiers. We show that it gives a minority class F-measure error reduction of 22.8%, while also reducing the error for other classes and the overall error by about 10%.

A Two-Step Feature Selection Method to Predict Cancerlectins by Multiview Features and Synthetic Minority Oversampling Technique

Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of the cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed for further improving the performance of identifying cancerle...

Evaluation of Classifiers in Software Fault-Proneness Prediction

Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

Publication date: 1997
